choice model
Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Yining Wang, Xi Chen, Yuan Zhou
In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of Op? T log log T q, which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
- North America > United States > Illinois (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York (0.04)
- (3 more...)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (4 more...)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
- North America > Canada (0.04)
- (2 more...)
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Lebanon (0.04)
From Small to Large: A Graph Convolutional Network Approach for Solving Assortment Optimization Problems
Li, Guokai, Gao, Pin, Jasin, Stefanus, Wang, Zizhuo
Assortment optimization seeks to select a subset of substitutable products, subject to constraints, to maximize expected revenue. The problem is NP-hard due to its combinatorial and nonlinear nature and arises frequently in industries such as e-commerce, where platforms must solve thousands of such problems each minute. We propose a graph convolutional network (GCN) framework to efficiently solve constrained assortment optimization problems. Our approach constructs a graph representation of the problem, trains a GCN to learn the mapping from problem parameters to optimal assortments, and develops three inference policies based on the GCN's output. Owing to the GCN's ability to generalize across instance sizes, patterns learned from small-scale samples can be transferred to large-scale problems. Numerical experiments show that a GCN trained on instances with 20 products achieves over 85% of the optimal revenue on problems with up to 2,000 products within seconds, outperforming existing heuristics in both accuracy and efficiency. We further extend the framework to settings with an unknown choice model using transaction data and demonstrate similar performance and scalability.
- North America > United States > Michigan (0.28)
- Asia > China (0.28)
Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
Alignment of large language models (LLMs) has predominantly relied on pairwise preference optimization, where annotators select the better of two responses to a prompt. While simple, this approach overlooks the opportunity to learn from richer forms of human feedback, such as multiwise comparisons and top-$k$ rankings. We propose Ranked Choice Preference Optimization (RCPO), a unified framework that bridges preference optimization with (ranked) choice modeling via maximum likelihood estimation. The framework is flexible, supporting both utility-based and rank-based choice models. It subsumes several existing pairwise methods (e.g., DPO, SimPO), while providing principled training objectives for richer feedback formats. We instantiate this framework with two representative ranked choice models (Multinomial Logit and Mallows-RMJ). Empirical studies on Llama-3-8B-Instruct and Gemma-2-9B-it across AlpacaEval 2 and Arena-Hard benchmarks show that RCPO consistently outperforms competitive baselines. RCPO shows how directly leveraging ranked preference data, combined with the right choice models, yields more effective alignment. It offers a versatile and extensible foundation for incorporating (ranked) choice modeling into LLM training.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Vietnam > Long An Province > Tân An (0.04)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)